Dike: Revisiting Resource Management for Distributed Deep Learning

Authors

  • Erci Xu
  • Mohit Saxena
  • Feng Qin
Abstract

The recent adoption of deep learning for diverse applications has required scaling infrastructures both horizontally and vertically. As a result, efficient resource management for distributed deep learning (DDL) frameworks is becoming increasingly important. However, existing techniques for scaling DDL applications rely on general-purpose resource managers originally designed for data-intensive applications. In contrast, DDL applications present unique challenges for resource management compared to traditional big data frameworks, such as a different master-slave communication paradigm, deeper ML models that are more compute- and network-bound than I/O-bound, and use of heterogeneous resources (GPUs, TPUs, and variable memory). In addition, most DDL frameworks require data scientists to manually configure the task placement and resource assignment to execute DDL models. In this paper, we present Dike, an application scheduler framework that transparently makes scheduling decisions for placement and resource assignment to DDL workers and parameter servers, based on the unique characteristics of the DDL model (number and type of parameters and neural network layers), node heterogeneity (CPU/GPU ratios), and input dataset. We have implemented Dike as a resource manager for DDL jobs in TensorFlow on top of Apache Mesos. We show that Dike significantly outperforms both manual and static assignment of resource offers to TensorFlow tasks, and achieves at least 95% of the optimal throughput for different DDL models such as ResNet and Inception.

Similar resources

Virtual Dike and Flood Simulator: Parallel distributed computing for flood early warning systems

The paper presents two simulation modules –Virtual Dike and Flood Simulator– developed for flood Early Warning Systems (EWS). The UrbanFlood EWS is a distributed system that (1) analyzes sensor data received in real-time from flood defenses (dikes, dams, etc.) and (2) simulates dike stability, breaching and flood propagation. Computational modules are invoked by workflow-based expert scenarios ...

IJASCSE, Volume 2, Special Issue 2, 2013

The aim of this paper is to provide a description of a deep-learning-based scheduling approach for academic-purpose high-performance computing systems. Academic-purpose distributed computing systems' (DCS) share reaches 17.4% amongst TOP500 supercomputer sites (15.6% in performance scale), which makes them a valuable object of research. The core of this approach is to predict the future workflow of th...

Global Warming: New Frontier of Research Deep Learning- Age of Distributed Green Smart Microgrid

The exponential increase in carbon-dioxide resulting Global Warming would make the planet earth to become inhabitable in many parts of the world with ensuing mass starvation. The rise of digital technology all over the world fundamentally have changed the lives of humans. The emerging technology of the Internet of Things, IoT, machine learning, data mining, biotechnology, biometric, and deep le...

Revisiting Distributed Synchronous SGD

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony. In contrast, the synchronous approach is often thought to be impractical due to idle time wasted on waiting for straggling workers. We revisit these conventional bel...

The Sungun porphyry magma resource and the 120,000-year difference in age between the main stock and the first dike: New evidence from 87Sr/86Sr, 143Nd/144Nd and Pb, SHRIMP U–Pb zircon dating in NW Iran

The Sungun copper porphyry deposit is hosted by a Tertiary magmatic complex in the Azarbayjan province, northwestern Iran. The Sungan mine in its southern and eastern parts is limited by early Miocene volcanic and by Late Cretaceous limestone rocks in northern and eastern parts respectively. The Sungun deposit is associated with a suite of porphyritic granitoids and late dikes intrudi...


Publication date: 2017